Abstract
We present a novel approach for learning a finite mixture model on a Riemannian manifold, where Euclidean metrics are not applicable and one must resort to geodesic distances consistent with the manifold geometry. For this purpose, we draw inspiration from a variant of the expectation-maximization algorithm that uses a minimum message length criterion to automatically estimate the optimal number of components from multivariate data lying on a Euclidean space. In order to use this approach on Riemannian manifolds, we propose a formulation in which each component is defined on a different tangent space, thus avoiding the loss of accuracy incurred when linearizing the manifold with a single tangent space. Our approach can be applied to any type of manifold for which the tangent space can be estimated. Additionally, we consider using shrinkage covariance estimation to improve the robustness of the method, especially when dealing with very sparsely distributed samples. We evaluate the approach in a number of settings, ranging from data clustering on manifolds to combining pose and kinematics of articulated bodies for 3D human pose tracking. In all cases, we demonstrate remarkable improvement compared to several chosen baselines.
Notes
Since vMF distributions are not directly applicable to predicting velocities on the hypersphere, we do not include them in this experiment.
References
Andriluka, M., Roth, S., & Schiele, B. (2010). Monocular 3D pose estimation and tracking by detection. In: IEEE Conference on Computer Vision and Pattern Recognition.
Archambeau, C., & Verleysen, M. (2005). Manifold constrained finite gaussian mixtures. In: Computational Intelligence and Bioinspired Systems (pp. 820–828). Berlin: Springer.
Banerjee, A., Dhillon, I. S., Ghosh, J., Sra, S., & Ridgeway, G. (2005). Clustering on the unit hypersphere using von Mises–Fisher distributions. Journal of Machine Learning Research, 6(9), 1345–1382.
Boothby, W. M. (2003). An introduction to differentiable manifolds and riemannian geometry (2nd ed.). New York: Academic Press.
Brand, M. (2003). Charting a manifold. In: Neural Information Processing Systems (pp. 961–968).
Brubaker, M. A., Salzmann, M., & Urtasun, R. (2012). A family of MCMC methods on implicitly defined manifolds. Journal of Machine Learning Research, 22, 161–172.
do Carmo, M. P. (1992). Riemannian geometry. Boston: Birkhäuser.
Caseiro, R., Martins, P., Henriques, J. F., & Batista, J. (2012). A nonparametric riemannian framework on tensor field with application to foreground segmentation. Pattern Recognition, 45(11), 3997–4017.
Caseiro, R., Martins, P., Henriques, J. F., Leite, F. S., & Batista, J. (2013). Rolling riemannian manifolds to solve the multi-class classification problem. In: IEEE Conference on Computer Vision and Pattern Recognition.
Chang, J., & Fisher III, J. W. (2013). Parallel sampling of dp mixture models using sub-cluster splits. In: Neural Information Processing Systems (pp. 620–628).
Chen, Y., Wiesel, A., Eldar, Y., & Hero, A. (2010). Shrinkage algorithms for mmse covariance estimation. IEEE Transactions on Signal Processing, 58(10), 5016–5029.
Darling, R. (1996). Martingales on noncompact manifolds: Maximal inequalities and prescribed limits. Annales de l’IHP Probabilités et statistiques, 32(4), 431–454.
Davis, B. C., Bullitt, E., Fletcher, P. T., & Joshi, S. (2007). Population shape regression from random design data. In: International Conference on Computer Vision.
Dedieu, J. P., & Nowicki, D. (2005). Symplectic methods for the approximation of the exponential map and the newton iteration on riemannian submanifolds. Journal of Complexity, 21(4), 487–501.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 1–38.
Deutscher, J., & Reid, I. (2005). Articulated body motion capture by stochastic search. International Journal of Computer Vision, 61(2), 185–205.
Figueiredo, M., & Jain, A. (2002). Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 381–396.
Fletcher, P., Lu, C., Pizer, S., & Joshi, S. (2004). Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, 23(8), 995–1005.
Freifeld, O., & Black, M. J. (2012). Lie bodies: A manifold representation of 3D human shape. In: European Conference on Computer Vision.
Gall, J., Rosenhahn, B., Brox, T., & Seidel, H. P. (2010). Optimization and filtering for human motion capture. International Journal of Computer Vision, 87, 75–92.
Harandi, M., Sanderson, C., Hartley, R., & Lovell, B. (2012). Sparse coding and dictionary learning for symmetric positive definite matrices: A kernel approach. In: European Conference on Computer Vision.
Harandi, M. T., Salzmann, M., & Hartley, R. (2014). From manifold to manifold: Geometry-aware dimensionality reduction for spd matrices. In: European Conference on Computer Vision.
Hauberg, S., Sommer, S., & Pedersen, K. S. (2012). Natural metrics and least-committed priors for articulated tracking. Image and Vision Computing, 30(6), 453–461.
Huckemann, S., Hotz, T., & Munk, A. (2010). Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric lie group actions. Statistica Sinica, 20, 1–100.
Ionescu, C., Li, F., & Sminchisescu, C. (2011). Latent structured models for human pose estimation. In: International Conference on Computer Vision.
Ionescu, C., Papava, D., Olaru, V., & Sminchisescu, C. (2014). Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7), 1325–1339.
Jain, S., & Govindu, V. (2013). Efficient higher-order clustering on the grassmann manifold. In: International Conference on Computer Vision.
Jayasumana, S., Hartley, R., Salzmann, M., Li, H., & Harandi, M. (2013). Kernel methods on the riemannian manifold of symmetric positive definite matrices. In: IEEE Conference on Computer Vision and Pattern Recognition.
Jayasumana, S., Hartley, R., Salzmann, M., Li, H., & Harandi, M. (2015). Kernel methods on Riemannian manifolds with Gaussian RBF Kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Karcher, H. (1977). Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics, 30(5), 509–541.
Lawrence, N. D. (2005). Probabilistic non-linear principal component analysis with gaussian process latent variable models. Journal of Machine Learning Research, 6, 1783–1816.
Lawrence, N. D., & Moore, A. J. (2007). Hierarchical Gaussian process latent variable models. In: International Conference in Machine Learning.
Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.
Ledoit, O., & Wolf, M. (2011). Nonlinear shrinkage estimation of large-dimensional covariance matrices. Institute for Empirical Research in Economics University of Zurich Working Paper (515).
Lenglet, C., Rousson, M., Deriche, R., & Faugeras, O. (2006). Statistics on the manifold of multivariate normal distributions: Theory and application to diffusion tensor mri processing. Journal of Mathematical Imaging and Vision, 25(3), 423–444.
Li, R., Tian, T. P., Sclaroff, S., & Yang, M. H. (2010). 3D human motion tracking with a coordinated mixture of factor analyzers. International Journal of Computer Vision, 87(1–2), 170–190.
Moeslund, T. B., & Granum, E. (2001). A survey of computer vision-based human motion capture. Computer Vision and Image Understanding, 81(3), 231–268.
Moeslund, T. B., Hilton, A., & Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104, 90–126.
Muralidharan, P., & Fletcher, P. T. (2012). Sasaki metrics for analysis of longitudinal data on manifolds. In: IEEE Conference on Computer Vision and Pattern Recognition.
Ozakin, A., & Gray, A. (2009). Submanifold density estimation. In: Neural Information Processing Systems (pp. 1375–1382).
Pelletier, B. (2005). Kernel density estimation on Riemannian manifolds. Statistics & Probability Letters, 73(3), 297–304.
Pennec, X. (2006). Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25(1), 127–154.
Pennec, X. (2009). Statistical computing on manifolds: From riemannian geometry to computational anatomy. In: Emerging Trends in Visual Computing (pp. 347–386). Berlin: Springer.
Pennec, X., Fillard, P., & Ayache, N. (2006). A Riemannian framework for tensor computing. International Journal of Computer Vision, 66(1), 41–66.
Quiñonero-Candela, J., Rasmussen, C. E., & Herbrich, R. (2005). A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6, 1939–1959.
Said, S., Courty, N., Bihan, N. L., & Sangwine, S. (2007). Exact principal geodesic analysis for data on \(SO(3)\). In: European Signal Processing Conference.
Sanin, A., Sanderson, C., Harandi, M., & Lovell, B. (2012). K-tangent spaces on riemannian manifolds for improved pedestrian detection. In: International Conference on Image Processing.
Sasaki, S. (1958). On the differential geometry of tangent bundles of Riemannian manifolds. Tohoku Mathematical Journal, Second Series, 10(3), 338–354.
Schäfer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1), 32.
Shirazi, S., Harandi, M., Sanderson, C., Alavi, A., & Lovell, B. (2012). Clustering on grassmann manifolds via kernel embedding with application to action analysis. In: International Conference on Image Processing.
Sigal, L., Bhatia, S., Roth, S., Black, M., & Isard, M. (2004). Tracking loose-limbed people. In: IEEE Conference on Computer Vision and Pattern Recognition.
Sigal, L., Isard, M., Haussecker, H. W., & Black, M. J. (2012). Loose-limbed people: Estimating 3D human pose and motion using non-parametric belief propagation. International Journal of Computer Vision, 98(1), 15–48.
Simo-Serra, E., Quattoni, A., Torras, C., & Moreno-Noguer, F. (2013). A joint model for 2D and 3D pose estimation from a single image. In: IEEE Conference on Computer Vision and Pattern Recognition.
Simo-Serra, E., Ramisa, A., Alenyà, G., Torras, C., & Moreno-Noguer, F. (2012). Single image 3D human pose estimation from noisy observations. In: IEEE Conference on Computer Vision and Pattern Recognition.
Simo-Serra, E., Torras, C., & Moreno-Noguer, F. (2014). Geodesic finite mixture models. In: British Machine Vision Conference.
Simo-Serra, E., Torras, C., & Moreno-Noguer, F. (2015). Lie algebra-based kinematic prior for 3D human pose tracking. In: International Conference on Machine Vision Applications.
Sivalingam, R., Boley, D., Morellas, V., & Papanikolopoulos, N. (2010). Tensor sparse coding for region covariances. In: European Conference on Computer Vision.
Sminchisescu, C., & Triggs, B. (2003). Estimating articulated human motion with covariance scaled sampling. International Journal of Robotics Research, 22(6), 371–391. Special issue on Visual Analysis of Human Movement.
Sommer, S. (2015). Anisotropic distributions on manifolds: Template estimation and most probable paths. In: Information Processing in Medical Imaging. Lecture Notes in Computer Science. Berlin: Springer.
Sommer, S., Lauze, F., Hauberg, S., & Nielsen, M. (2010). Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations. In: European Conference on Computer Vision.
Sommer, S., Lauze, F., & Nielsen, M. (2014). Optimization over geodesics for exact principal geodesic analysis. Advances in Computational Mathematics, 40(2), 283–313.
Straub, J., Chang, J., Freifeld, O., & Fisher III, J. W. (2015). A dirichlet process mixture model for spherical data. In: International Conference on Artificial Intelligence and Statistics.
Taylor, G., Sigal, L., Fleet, D., & Hinton, G. (2010). Dynamical binary latent variable models for 3d human pose tracking. In: IEEE Conference on Computer Vision and Pattern Recognition.
Tosato, D., Farenzena, M., Cristani, M., Spera, M., & Murino, V. (2010). Multi-class classification on riemannian manifolds for video surveillance. In: European Conference on Computer Vision (pp. 378–391).
Tosato, D., Spera, M., Cristani, M., & Murino, V. (2013). Characterizing humans on riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1972–1984.
Tournier, M., Wu, X., Courty, N., Arnaud, E., & Reveret, L. (2009). Motion compression using principal geodesics analysis. Computer Graphics Forum, 28(2), 355–364.
Turaga, P., Veeraraghavan, A., Srivastava, A., & Chellappa, R. (2011). Statistical computations on grassmann and stiefel manifolds for image and video-based recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11), 2273–2286.
Tuzel, O., Porikli, F., & Meer, P. (2008). Pedestrian detection via classification on Riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10), 1713–1727.
Urtasun, R., Fleet, D. J., & Fua, P. (2006). 3D people tracking with gaussian process dynamical models. In: IEEE Conference on Computer Vision and Pattern Recognition.
Urtasun, R., Fleet, D. J., & Lawrence, N. D. (2007). Modeling human locomotion with topologically constrained latent variable models. In: Proceedings of the 2nd Conference on Human Motion: Understanding, Modeling, Capture and Animation.
Varol, A., Salzmann, M., Fua, P., & Urtasun, R. (2012). A constrained latent variable model. In: IEEE Conference on Computer Vision and Pattern Recognition.
Wallace, C. S., & Freeman, P. R. (1987). Estimation and inference by compact coding. Journal of the Royal Statistical Society: Series B (Methodological), 240–265.
Wang, J., Fleet, D., & Hertzmann, A. (2005). Gaussian process dynamical models. In: Neural Information Processing Systems.
Yao, A., Gall, J., Gool, L. V., & Urtasun, R. (2011). Learning probabilistic non-linear latent variable models for tracking complex activities. In: Neural Information Processing Systems.
Zhang, M., & Fletcher, P. T. (2013). Probabilistic principal geodesic analysis. In: Neural Information Processing Systems (pp. 1178–1186).
Acknowledgments
We would like to thank the three anonymous reviewers for their insights and comments that have significantly contributed to improving this manuscript. This work was partly funded by the Spanish MINECO project RobInstruct TIN2014-58178-R and by the ERA-net CHISTERA project I-DRESS PCIN-2015-147.
Additional information
Communicated by Hiroshi Ishikawa, Takeshi Masuda, Yasuyo Kita and Katsushi Ikeuchi.
Appendices
Appendix 1: Derivation of Mixture Models on Riemannian Manifolds
We follow the standard expectation-maximization approach to maximize the log-likelihood of our model, adapting it to Riemannian manifolds. For simplicity, we will not consider the Minimum Message Length criterion for model selection. We start out by defining the log-likelihood of the model \(\lambda (x,\theta )\) and bounding it by Jensen’s inequality:
with \(w_k^{(i)}\) as auxiliary variables that represent membership probabilities. We can maximize over the lower bound \(B(x,\theta )\) instead of the intractable full likelihood.
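For reference, the standard form of this bound can be sketched as follows, assuming N samples \(x^{(i)}\), K components with mixing weights \(\alpha _k\) and component densities \(p(x^{(i)} \mid \theta _k)\), and \(\sum _k w_k^{(i)} = 1\). The symbols N, K and p are assumed notation for this sketch.

```latex
\[
\lambda(x,\theta)
  = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \alpha_k\, p(x^{(i)} \mid \theta_k)
  = \sum_{i=1}^{N} \log \sum_{k=1}^{K} w_k^{(i)}\,
      \frac{\alpha_k\, p(x^{(i)} \mid \theta_k)}{w_k^{(i)}}
  \ge \sum_{i=1}^{N} \sum_{k=1}^{K} w_k^{(i)}
      \log \frac{\alpha_k\, p(x^{(i)} \mid \theta_k)}{w_k^{(i)}}
  = B(x,\theta)
\]
```

The inequality follows from the concavity of the logarithm, with the \(w_k^{(i)}\) acting as convex-combination weights for each sample.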
1.1 E-step
The E-step maximizes the lower bound with respect to the auxiliary terms \(w_k^{(i)}\), which are the membership probabilities of the samples. This is done by solving:
This is straightforward to do by computing the derivative and equating it to 0 to obtain the update rule for step t:
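As a minimal sketch of this update, assume the usual closed form in which \(w_k^{(i)}\) is proportional to \(\alpha _k\, p(x^{(i)} \mid \theta _k)\) and normalized over k. The symbol p for the component density and all function names below are illustrative and not taken from the article. The membership probabilities can then be computed from per-component log-densities, using a log-sum-exp shift for numerical stability:

```python
import math

def e_step(log_alpha, log_pdf):
    """Return membership probabilities w[i][k].

    log_alpha[k]  : log mixing weight of component k (assumed notation)
    log_pdf[i][k] : log density of sample i under component k
    """
    weights = []
    for row in log_pdf:
        # Unnormalized log-memberships: log(alpha_k) + log p(x_i | theta_k)
        scores = [la + lp for la, lp in zip(log_alpha, row)]
        # Subtract the maximum before exponentiating to avoid underflow
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Working in the log domain matters here because densities evaluated far from a component's mean can underflow to zero in floating point.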
1.2 M-step
In this step, we have fixed w and are updating the other parameters \(\theta =(\mu ,\varSigma )\) and \(\alpha \) by solving:
We shall follow the same approach as in the E-step and compute the partial derivatives to obtain the update rules. In particular, both \(\alpha \) and \(\varSigma \) are straightforward to compute, and do not significantly deviate from the standard formulation. Thus for \(\alpha \) we obtain:
and for \(\varSigma \):
and thus,
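As a sketch of the standard closed forms that these derivatives lead to, assume N samples (N is assumed notation) and a covariance for each component estimated on the tangent space at \(\mu _k\) through the logarithmic map. The updates can then be written as:

```latex
\[
\alpha_k = \frac{1}{N} \sum_{i=1}^{N} w_k^{(i)},
\qquad
\varSigma_k =
  \frac{\sum_{i=1}^{N} w_k^{(i)}\,
        \log_{\mu_k}(x^{(i)})\, \log_{\mu_k}(x^{(i)})^{\top}}
       {\sum_{i=1}^{N} w_k^{(i)}}
\]
```

The only difference from the Euclidean case is that the deviation \(x^{(i)} - \mu _k\) is replaced by the tangent vector \(\log _{\mu _k}(x^{(i)})\).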
For the mean \(\mu _k\) we can follow the same approach; however, because of the logarithmic map, it is harder to solve. We start out by computing the partial derivative:
In general, there is no analytic solution to \(\frac{\partial \log _{\mu _k}(x^{(i)})}{\partial \mu _k}\). However, under the assumption that \(\frac{\partial \log _{\mu _k}(x^{(i)})}{\partial \mu _k} = c\) where \(c \ne 0\) is a constant, and equating the partial derivative to 0, we can obtain:
For simply connected and complete manifolds whose curvature is non-positive (i.e., Hadamard manifolds) and bounded from below, there exists one and only one Riemannian center of mass, which is characterized by \(E[\log _{\mu }(x)]=0\) (Darling 1996). Note that a complete and simply connected manifold with non-positive curvature bounded from below has no cut locus. In this case, as Eq. (45) is the discrete expectation of the weighted sum, we can establish the update rule for the mean by:
Note that this does not hold when there is a cut locus, in which case the Riemannian center of mass may not be unique. However, in practice, this approach will generally converge to the center of mass. We will use Eq. (46) in all cases.
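A minimal sketch of such a mean update on the unit sphere \(S^2\) is given below. It assumes the common fixed-point form \(\mu \leftarrow \exp _{\mu }\big (\sum _i w^{(i)} \log _{\mu }(x^{(i)}) / \sum _i w^{(i)}\big )\) and the standard closed-form exponential and logarithmic maps of the sphere. The function names are illustrative, and the sketch is not claimed to reproduce Eq. (46) exactly.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sphere_log(mu, x):
    """Logarithmic map on the unit sphere: tangent vector at mu pointing to x."""
    c = max(-1.0, min(1.0, dot(mu, x)))
    # The component of x orthogonal to mu gives the tangent direction
    v = [xi - c * mi for xi, mi in zip(x, mu)]
    n = math.sqrt(dot(v, v))
    if n < 1e-12:
        # x coincides with mu; the antipodal case (cut locus) is not handled
        return [0.0] * len(mu)
    theta = math.acos(c)  # geodesic distance between mu and x
    return [theta * vi / n for vi in v]

def sphere_exp(mu, v):
    """Exponential map on the unit sphere: follow the geodesic from mu along v."""
    n = math.sqrt(dot(v, v))
    if n < 1e-12:
        return list(mu)
    return [math.cos(n) * mi + math.sin(n) * vi / n for mi, vi in zip(mu, v)]

def weighted_mean(points, weights, mu, iters=100, tol=1e-12):
    """Iterate mu <- exp_mu(sum_i w_i log_mu(x_i) / sum_i w_i)."""
    total = sum(weights)
    for _ in range(iters):
        step = [0.0] * len(mu)
        for x, w in zip(points, weights):
            tangent = sphere_log(mu, x)
            step = [s + w * t / total for s, t in zip(step, tangent)]
        mu = sphere_exp(mu, step)
        if math.sqrt(dot(step, step)) < tol:
            break  # weighted tangent mean is (numerically) zero
    return mu
```

The iteration stops when the weighted mean of the tangent vectors vanishes, which is the discrete counterpart of the condition \(E[\log _{\mu }(x)]=0\).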
Finally, we perform a numerical analysis of the error for the \(S^2\) sphere by numerically computing \(\frac{\partial \log _{\mu _k}(x^{(i)})}{\partial \mu _k}\) and visualizing the results. In particular, we visualize the change in the Frobenius norm of the Jacobian in Fig. 10. We can see that points near the origin have very little change in the derivative. Again, the use of multiple tangent planes favors configurations in which the points are close to the center and thus keeps the error produced by approximating the partial derivative by a constant within reasonable bounds.
Appendix 2: Clustering with Von Mises–Fisher Distributions
Given a random vector x on the unit hypersphere of dimension \(q-1\), the probability density function of a von Mises–Fisher distribution with mean direction \(\mu \) and concentration \(\kappa \) can be written as:
where \(\Vert \mu \Vert =1\), and \(I_{q/2-1}\) is the modified Bessel function of the first kind and order \(q/2-1\). Note that the concentration parameter \(\kappa \) is a single scalar: the distribution is uniform on the sphere for \(\kappa =0\) and unimodal for \(\kappa >0\).
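For reference, the density described here has the following well-known form, written in the notation of the surrounding text. The normalizer symbol \(c_q\) is assumed for this sketch.

```latex
\[
f(x \mid \mu, \kappa) = c_q(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad
c_q(\kappa) = \frac{\kappa^{q/2-1}}{(2\pi)^{q/2}\, I_{q/2-1}(\kappa)}
\]
```

For \(q=3\) the normalizer reduces to \(c_3(\kappa ) = \kappa / (4\pi \sinh \kappa )\), since \(I_{1/2}(\kappa ) = \sqrt{2/(\pi \kappa )}\, \sinh \kappa \).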
The algorithm from Figueiredo and Jain (2002) can be modified to use von Mises–Fisher distributions by adapting the way the distributions are recalculated in the M-step. This can be computed by:
As there exists no analytic solution for \(\kappa _k(t)\) in \(I_{q/2}(\kappa _k(t))/I_{q/2-1}(\kappa _k(t))=r_k\), the computed value of \(\kappa _k(t)\) is an approximation (Banerjee et al. 2005).
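A minimal sketch of such an M-step for a single component is given below. It assumes the widely used closed-form approximation \(\kappa _k \approx (r_k q - r_k^3)/(1 - r_k^2)\) from Banerjee et al. (2005), where \(r_k\) is the weighted mean resultant length. Function and variable names are illustrative, and degenerate inputs (zero resultant, or all points identical) are not handled.

```python
import math

def vmf_m_step(points, weights):
    """Update (alpha, mu, kappa) of one von Mises-Fisher component.

    points  : unit vectors in R^q
    weights : membership probabilities of the samples for this component
    """
    q = len(points[0])
    n = len(points)
    w_sum = sum(weights)
    # Weighted resultant vector: sum_i w_i x_i
    r = [sum(w * x[d] for x, w in zip(points, weights)) for d in range(q)]
    r_norm = math.sqrt(sum(c * c for c in r))
    alpha = w_sum / n              # mixing weight
    mu = [c / r_norm for c in r]   # mean direction (unit vector)
    r_bar = r_norm / w_sum         # mean resultant length, in [0, 1)
    # Approximate solution of I_{q/2}(kappa) / I_{q/2-1}(kappa) = r_bar
    kappa = (r_bar * q - r_bar ** 3) / (1.0 - r_bar ** 2)
    return alpha, mu, kappa
```

For \(q=3\) the Bessel ratio has the closed form \(\coth \kappa - 1/\kappa \), which offers a simple way to check how close the approximation is for a given \(r_k\).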
Cite this article
Simo-Serra, E., Torras, C. & Moreno-Noguer, F. 3D Human Pose Tracking Priors using Geodesic Mixture Models. Int J Comput Vis 122, 388–408 (2017). https://doi.org/10.1007/s11263-016-0941-2